Counterfactual Reasoning and Probabilistic Methods for Trustworthy AI

Go/No Go Meeting 2022

Patrick Altmeyer

Overview

  • Trustworthy AI 🔮 … and how I think about it
  • Looking back 🚩
  • The Road Ahead 🎯
  • Questions ❓

The Problem with Today’s AI

From human to data-driven decision-making …

  • Black-box models like deep neural networks are being deployed virtually everywhere.
  • Includes safety-critical and public domains: health care, autonomous driving, finance, …
  • More likely than not that your loan or employment application is handled by an algorithm.

… where black boxes are a recipe for disaster.

  • We have no idea what exactly we’re cooking up …
    • Have you received an automated rejection email? Why didn’t you “mEet tHe sHoRtLisTiNg cRiTeRia”? 🙃
  • … but we do know that some of it is junk.
Figure 1: Adversarial attacks on deep neural networks. Source: Goodfellow, Shlens, and Szegedy (2014)

Towards Trustworthy AI

Ground Truthing

Probabilistic Models

Counterfactual Reasoning


Current Standard in ML

We typically want to maximize the likelihood of observing \(\mathcal{D}_n\) under given parameters (Murphy 2022):

\[ \theta^* = \arg \max_{\theta} p(\mathcal{D}_n|\theta) \qquad(1)\]

Compute an MLE (or MAP) point estimate \(\hat\theta = \mathbb{E} \theta^*\) and use the plugin approximation for prediction:

\[ p(y|x,\mathcal{D}_n) \approx p(y|x,\hat\theta) \qquad(2)\]

  • In an ideal world we can just use parsimonious and interpretable models like GLMs (Rudin 2019), for which in many cases we can rely on asymptotic properties of \(\hat\theta\) to quantify uncertainty.
  • In practice these models often have performance limitations.
  • Black-box models like deep neural networks are popular, but they are also the very opposite of parsimonious.
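To make the plugin idea concrete, here is a minimal, self-contained Julia sketch (my own toy example, not taken from the slides): a logistic regression is fitted by maximum likelihood as in Equation (1), and predictions are then made with the plugin approximation of Equation (2).

```julia
using Random
Random.seed!(2022)

σ(z) = 1 / (1 + exp(-z))

# Toy data: two Gaussian blobs, labels y ∈ {0, 1}.
n = 100
X = hcat(randn(2, n ÷ 2) .- 1, randn(2, n ÷ 2) .+ 1)   # 2 × n feature matrix
y = vcat(zeros(n ÷ 2), ones(n ÷ 2))

# Gradient ascent on the log-likelihood (MLE), with ∇ℓ(θ) = X (y - σ.(X'θ)):
θ̂ = zeros(2)
for _ in 1:1_000
    θ̂ .+= 0.1 / n .* (X * (y .- σ.(X' * θ̂)))
end

# Plugin approximation (Equation 2): p(y = 1 | x, D) ≈ p(y = 1 | x, θ̂).
plugin_predict(x) = σ(θ̂' * x)
@show plugin_predict([1.0, 1.0])
```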

Objective


[…] deep neural networks are typically very underspecified by the available data, and […] parameters [therefore] correspond to a diverse variety of compelling explanations for the data. (Wilson 2020)

In this setting it is often crucial to treat models probabilistically!

\[ p(y|x,\mathcal{D}_n) = \int p(y|x,\theta)p(\theta|\mathcal{D}_n)d\theta \qquad(3)\]
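A minimal sketch of what Equation (3) buys us in practice (my own toy example): draw samples from an assumed Gaussian posterior over \(\theta\), for instance one obtained via a Laplace approximation, and average the per-sample predictions by Monte Carlo; the resulting probability is typically less overconfident than the plugin estimate.

```julia
# Monte Carlo approximation of Equation (3): p(y|x,D) ≈ (1/S) Σₛ p(y|x,θ⁽ˢ⁾).
using LinearAlgebra, Random, Statistics
Random.seed!(2022)

σ(z) = 1 / (1 + exp(-z))

μ = [1.0, -0.5]                 # posterior mean (e.g. the MAP estimate)
Σ = [0.3 0.1; 0.1 0.2]          # posterior covariance (assumed Gaussian posterior)
L = cholesky(Σ).L

x, S = [0.5, 1.0], 1_000
samples = [μ + L * randn(2) for _ in 1:S]      # θ⁽ˢ⁾ ~ N(μ, Σ)
p_bayes  = mean(σ(θ' * x) for θ in samples)    # posterior predictive estimate
p_plugin = σ(μ' * x)                           # plugin approximation for comparison
@show p_plugin p_bayes
```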

Counterfactual Reasoning

We can now make predictions – great! But do we know how the predictions are actually being made?

Objective

With the model trained for its task, we are interested in understanding how its predictions change in response to input changes.

\[ \nabla_x p(y|x,\mathcal{D}_n;\hat\theta) \qquad(4)\]

  • Counterfactual reasoning (in this context) boils down to a simple question: what if \(x\) (factual) \(\Rightarrow\) \(x^\prime\) (counterfactual)?
  • By strategically perturbing features and checking the model output, we can (begin to) understand how the model makes its decisions.
  • Counterfactual Explanations always have full fidelity by construction (as opposed to surrogate explanations, for example).


Important to realize that we are keeping \(\hat\theta\) constant!
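The following is a bare-bones Julia sketch of such a generic, gradient-based counterfactual search in the spirit of Equation (4) (my own toy example, not the CounterfactualExplanations.jl API): the input is perturbed by gradient descent on a simple counterfactual loss while the fitted parameters \(\hat\theta\) stay fixed.

```julia
σ(z) = 1 / (1 + exp(-z))

θ̂ = [2.0, -1.0]                  # frozen model parameters
f(x) = σ(θ̂' * x)                 # black-box probability of the target class

x_factual = [-1.0, 1.0]          # factual input, predicted as the negative class
target, λ, η = 1.0, 0.1, 0.5

# Counterfactual loss: distance to the target prediction + λ · cost of moving away from x.
loss(x′) = (f(x′) - target)^2 + λ * sum(abs2, x′ - x_factual)

# Analytic gradient w.r.t. the input (the model itself is differentiable):
∇loss(x′) = 2 * (f(x′) - target) * f(x′) * (1 - f(x′)) .* θ̂ .+ 2λ .* (x′ - x_factual)

x′ = copy(x_factual)
for _ in 1:200
    x′ .-= η .* ∇loss(x′)        # perturb the features; θ̂ never changes
end
@show f(x_factual) f(x′)         # the prediction moves towards the target class
```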

Some achievements …

  1. Three presentations at JuliaCon 2022.
  2. IEEE SaTML 2022 submission “Endogenous Macrodynamics in Algorithmic Recourse” (under review)
    • The submitted paper can be found here.
    • The code for the companion Julia package can be found here.
  3. Supervision of student Research Project

Explaining Black-Box Models through Counterfactuals


CounterfactualExplanations.jl is a package for generating Counterfactual Explanations (CE) and Algorithmic Recourse (AR) for black-box algorithms. CE and AR are related tools for explainable artificial intelligence (XAI). While the package is written purely in Julia, it can be used to explain machine learning algorithms developed and trained in other popular programming languages like Python and R. See below for a short introduction and other resources, or dive straight into the docs.

Turning a nine (9) into a four (4).

A sad 🐱 on its counterfactual path to its cool dog friends.

Effortless Bayesian Deep Learning through Laplace Redux


LaplaceRedux.jl (formerly BayesLaplace.jl) is a small package that can be used for effortless Bayesian Deep Learning and Logistic Regression through Laplace Approximation. It is inspired by this Python library and its companion paper.
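For intuition, here is a conceptual Julia sketch of what a Laplace approximation does for a logistic regression (my own toy example, not the LaplaceRedux.jl API): the posterior is approximated by a Gaussian centred at the MAP estimate with covariance given by the inverse Hessian, and predictions are made with the standard probit approximation to the posterior predictive.

```julia
using LinearAlgebra, Random
Random.seed!(2022)

σ(z) = 1 / (1 + exp(-z))

# Toy data and a Gaussian prior N(0, λ⁻¹ I):
n, λ = 100, 1.0
X = hcat(randn(2, n ÷ 2) .- 1, randn(2, n ÷ 2) .+ 1)
y = vcat(zeros(n ÷ 2), ones(n ÷ 2))

# MAP estimate via gradient ascent on the log-joint:
θ̂ = zeros(2)
for _ in 1:2_000
    θ̂ .+= 0.1 / n .* (X * (y .- σ.(X' * θ̂)) .- λ .* θ̂)
end

# Laplace approximation: Gaussian posterior q(θ|D) = N(θ̂, H⁻¹), with H the Hessian at θ̂.
W = Diagonal(σ.(X' * θ̂) .* (1 .- σ.(X' * θ̂)))
H = X * W * X' + λ * I
Σ = inv(H)

# Posterior predictive via the probit approximation:
function predict(x)
    μ, s² = θ̂' * x, x' * Σ * x
    return σ(μ / sqrt(1 + π * s² / 8))
end
@show predict([2.0, 2.0]) predict([0.0, 0.0])
```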

Plugin Approximation (left) and Laplace Posterior (right) for a simple artificial neural network.

Simulation of changing posterior predictive distribution. Image by author.

Endogenous Macrodynamics in AR - motivation

TLDR: We find that the standard implementation of various SOTA approaches to AR can induce substantial domain and model shifts. We argue that these dynamics indicate that individual recourse generates hidden external costs, and we propose mitigation strategies.

Description: In this work we investigate what happens if Algorithmic Recourse is actually implemented by a large number of individuals. The chart below illustrates what we mean by Endogenous Macrodynamics in Algorithmic Recourse: (a) we have a simple linear classifier trained for binary classification where samples from the negative class (y=0) are marked in blue and samples from the positive class (y=1) are marked in orange; (b) the implementation of AR for a random subset of individuals leads to a noticeable domain shift; (c) as the classifier is retrained we observe a corresponding model shift (Upadhyay, Joshi, and Lakkaraju 2021); (d) as this process is repeated, the decision boundary moves away from the target class.

Proof of concept: repeated implementation of AR leads to domain and model shifts.

We argue that these shifts should be considered as an expected external cost of individual recourse and call for a paradigm shift from individual to collective recourse in these types of situations.
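A self-contained Julia toy sketch of this feedback loop (my own illustration, not the experiment code from the paper): a random subset of the negative class implements a crude form of recourse, the classifier is retrained on the shifted data, and the cycle repeats.

```julia
using LinearAlgebra, Random
Random.seed!(2022)

σ(z) = 1 / (1 + exp(-z))

function train(X, y; epochs = 2_000, η = 0.1)
    θ = zeros(size(X, 1))
    for _ in 1:epochs
        θ .+= η / length(y) .* (X * (y .- σ.(X' * θ)))   # logistic regression MLE
    end
    return θ
end

# Crude stand-in for a recourse generator: move along the weight vector
# (in feature space only) until the predicted probability exceeds γ.
function recourse(x, θ; γ = 0.6)
    d = [θ[1], θ[2], 0.0]
    while σ(θ' * x) < γ
        x = x .+ 0.05 .* d ./ norm(d)
    end
    return x
end

n = 200
X = vcat(hcat(randn(2, n ÷ 2) .- 2, randn(2, n ÷ 2) .+ 2), ones(1, n))  # last row = bias
y = vcat(zeros(n ÷ 2), ones(n ÷ 2))
θ = train(X, y)

for t in 1:5
    movers = findall((y .== 0) .& (rand(n) .< 0.1))   # random subset seeking recourse
    for i in movers
        X[:, i] = recourse(X[:, i], θ)                # domain shift
        y[i] = 1.0                                    # recourse granted
    end
    global θ = train(X, y)                            # retraining ⇒ model shift
    println("round $t: θ = ", round.(θ; digits = 2))  # watch the boundary drift
end
```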

Mitigation strategies.

Results for synthetic data.

Mitigation strategies applied to synthetic data.

LaplaCE: Realistic and Surrogate-Free Counterfactual Explanations

Good VAE


The results in Figure 2 look great!

Figure 2: Turning a nine (9) into a four (4) using REVISE. It appears that the VAE is well-specified in this case.

Bad VAE


But things can also go wrong …

The VAE used to generate the counterfactual in Figure 3 is not expressive enough.

Figure 3: Turning a seven (7) into a nine (9) using REVISE with a weak VAE.


The counterfactual in Figure 4 is also valid … what to do?

Figure 4: Turning a seven (7) into a nine (9) using generic search.

TLDR: Using Linearized Laplace Redux we can compute predictive uncertainty estimates for any neural network (Daxberger et al. 2021). By minimizing predictive uncertainty we can create realistic counterfactual explanations (Schut et al. 2021) without the need for surrogate generative models.

Description: We propose LaplaCE: an effortless way to produce realistic Counterfactual Explanations for deep neural networks (DNN) using Laplace Approximation (LA). To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data-generating process (e.g. Joshi et al. (2019)). While this is an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces a significant engineering overhead, but also reallocates the task of creating realistic model explanations from the model itself to the surrogate generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfill that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion. Recent work on Bayesian Deep Learning has shown that LA can be used effectively in this context. By leveraging this finding we show that it is possible to generate realistic counterfactual explanations without the need to restrict the class of models or rely on a separate generative model.
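A rough way to write the idea down (notation mine, not taken from the cited papers): replace the plugin probability in the counterfactual objective with the Laplace-approximated posterior predictive, so that the search is drawn towards inputs for which the model is confident under (an approximation of) \(p(\theta|\mathcal{D}_n)\) rather than under a single point estimate:

\[ x^\prime = \arg \min_{x^\prime} \ell\left( \int p(y=t|x^\prime,\theta) \, q(\theta|\mathcal{D}_n) \, d\theta, \, 1 \right) + \lambda \, \text{cost}(x^\prime, x) \]

where \(t\) is the target class, \(q(\theta|\mathcal{D}_n)\) is the Gaussian (Laplace) approximation to the posterior, \(\ell\) is a suitable loss and \(\text{cost}(\cdot,\cdot)\) penalises the distance from the factual \(x\).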

Potential venues: FAccT (Dec ’22), AIES (March ’23), NeurIPS (May ’23), SaTML (Sep ’23)

“Crack my code” - can XAI teach users black-box behaviour?

TLDR: Using a gamified experiment we test whether SOTA XAI methods can actually help users understand the workings of a black-box model.

Description: Do Counterfactual Explanations actually help users to understand the workings of a black-box model? In this work we investigate this question through gamified experiments. The idea is to set up an experiment as follows:

  1. Fit a neural network on a number of synthetic features with two classes and no inherent meaning.
  2. Generate a random sample from either of the two classes and show it to the user.
  3. Let the user predict the class and compare it to the actual prediction of the neural network.
  4. Show the user an XAI explanation and reward them if their guess matches the actual prediction.
  5. Repeat steps 2-4 many times.

If the XAI method is useful, the discrepancy between the user's guesses and the neural network's predictions should diminish over time. This project idea is inspired by an AIES 2022 paper that employs a similar framework (Dai et al. 2022).
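A minimal Julia skeleton of this loop (my own sketch; the user is a random stub here so that the script runs, whereas in the real experiment this would be a human interacting with the game, and the explanation would come from an actual XAI method):

```julia
using Random, Statistics
Random.seed!(2022)

σ(z) = 1 / (1 + exp(-z))
θ = randn(3)                              # "black box" fit on meaningless synthetic features
predict(x) = σ(θ' * x) > 0.5 ? 1 : 0

ask_user(x, explanation) = rand(0:1)      # stub: replace with real user input

hits = Bool[]
for _ in 1:50
    x = randn(3)                          # random sample shown to the user
    guess = ask_user(x, "placeholder XAI explanation")
    push!(hits, guess == predict(x))      # reward if the guess matches the model
end
println("agreement rate: ", mean(hits))   # with a real user and useful explanations, this should improve over rounds
```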

Potential venues: AIES (March ’23), JuliaCon (April ’23)

Not your Typical Black-Box: the Dutch Childcare Benefits Scandal through the Lens of Counterfactual Explanations

TLDR: The automated decision-making system used by the Dutch tax authorities is opaque not because of its complexity, but rather by design. The goal of this work is to explore ways to explain such black boxes through counterfactuals.

Description: The Dutch childcare benefits scandal involved false fraud allegations based on an automated decision-making system (ADMS) used by the tax authorities. The ADMS was essentially a collection of spreadsheets containing hard-coded rules. The sheer quantity of spreadsheets has made it difficult for experts (Cynthia) to disentangle the inner workings and hence understand the behaviour of the ADMS. We can think of this as a non-conventional black box that is opaque not because of its complexity, but rather by design. We believe that these types of ADMS are still widely prevalent in industry and should therefore be considered a different kind of threat to AI integrity. The goal of this work is to explore ways to explain such black boxes through counterfactuals. This is a challenging and ambitious task, but a few strategies come to mind: 1) use brute force to search for counterfactuals; 2) use Growing Spheres (Laugel et al. 2017) to generate counterfactuals; 3) derive a decision tree from the spreadsheets and generate counterfactuals for the tree.
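As a hedged illustration of strategy (2), here is a bare-bones Growing Spheres-style search in Julia (my own sketch, loosely following Laugel et al. (2017) and omitting their feature-sparsification step), run against a stand-in rule-based classifier since the actual ADMS is not available:

```julia
using LinearAlgebra, Random
Random.seed!(2022)

blackbox(x) = (x[1] > 0.5 && x[2] > 0.3) ? 1 : 0     # stand-in for hard-coded rules

function growing_spheres(x, f; step = 0.1, n_samples = 500, r_max = 10.0)
    target = 1 - f(x)                                 # flip the current prediction
    r = step
    while r < r_max
        for _ in 1:n_samples
            u = randn(length(x)); u ./= norm(u)       # random direction
            candidate = x .+ (r - step + step * rand()) .* u   # sample the current layer
            f(candidate) == target && return candidate
        end
        r += step                                     # grow the sphere and try again
    end
    return nothing
end

x = [0.2, 0.1]
x′ = growing_spheres(x, blackbox)
@show blackbox(x) x′ blackbox(x′)
```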

Potential venues: FAccT 2024

Counterfactual Explanations for Credit Risk Monitoring in Central Banks

Description: This fall I will give a seminar about Counterfactual Explanations and Algorithmic Recourse at the Bank of England. Bank researchers are interested in applying CE and AR to their bank risk prediction models.

Potential venues:

  1. Blog post applying recent findings to sovereign default risk dataset.
  2. Contribution to BoE Staff Working Paper.

Source: Paul Fiedler on Unsplash

Counterfactual Explanations for Regression Problems

Description: The literature on Counterfactual Explanations almost exclusively focuses on classification problems. In Finance and Economics, however, the overwhelming majority of problems involve regression. Hence it is perhaps not altogether surprising that practitioners and researchers in these fields are largely unfamiliar with CE and instead typically rely on surrogate explanations like LIME and SHAP to explain black-box models. Using Spooner et al. (2021) as a potential starting point, I would be interested in exploring how state-of-the-art CE approaches can be applied to regression problems.

Potential venues: -

Source: Spooner et al. (2021)

Other plans

Priorities

  • Master's student supervision.
  • Contribute and/or participate in TU Delft Summer School on XAI.
  • Proposal for Google Summer of Code.
  • Increased co-operation with ING.

Side projects

  • More blog post implementations of Murphy (2022).
  • Revise master’s work on Deep VAR.

Photo by Ivan Diaz on Unsplash

Questions ❓

References

Dai, Xinyue, Mark T Keane, Laurence Shalloo, Elodie Ruelle, and Ruth MJ Byrne. 2022. “Counterfactual Explanations for Prediction and Diagnosis in XAI.” In, 215–26. https://doi.org/10.1145/3514094.3534144.
Daxberger, Erik, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, and Philipp Hennig. 2021. “Laplace Redux – Effortless Bayesian Deep Learning.” Advances in Neural Information Processing Systems 34.
Goodfellow, Ian J, Jonathon Shlens, and Christian Szegedy. 2014. “Explaining and Harnessing Adversarial Examples.” https://arxiv.org/abs/1412.6572.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” https://arxiv.org/abs/1907.09615.
Laugel, Thibault, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, and Marcin Detyniecki. 2017. “Inverse Classification for Comparison-Based Interpretability in Machine Learning.” https://arxiv.org/abs/1712.08443.
Murphy, Kevin P. 2022. Probabilistic Machine Learning: An Introduction. MIT Press.
Rudin, Cynthia. 2019. “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead.” Nature Machine Intelligence 1 (5): 206–15. https://doi.org/10.1038/s42256-019-0048-x.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Spooner, Thomas, Danial Dervovic, Jason Long, Jon Shepard, Jiahao Chen, and Daniele Magazzeni. 2021. “Counterfactual Explanations for Arbitrary Regression Models.” https://arxiv.org/abs/2106.15212.
Upadhyay, Sohini, Shalmali Joshi, and Himabindu Lakkaraju. 2021. “Towards Robust and Reliable Algorithmic Recourse.” Advances in Neural Information Processing Systems 34: 16926–37.
Wilson, Andrew Gordon. 2020. “The Case for Bayesian Deep Learning.” https://arxiv.org/abs/2001.10995.